Time to absorption in discounted reinforcement models

Authors

  • Robin Pemantle
  • Brian Skyrms
Abstract

Reinforcement schemes are a class of non-Markovian stochastic processes. Their non-Markovian nature allows them to model a form of memory of the past. One subclass consists of models in which the past is exponentially discounted or forgotten. Models in this subclass often have the property of becoming trapped with probability 1 in some degenerate state. While previous work has concentrated on such limit results, we concentrate here on a contrary effect, namely that the time to become trapped may increase exponentially in 1/x as the discount rate, 1−x, approaches 1. As a result, the time to become trapped may easily exceed the lifetime of the simulation or of the physical data being modeled. In such a case, the quasi-stationary behavior is more germane. We apply our results to a model of social network formation based on ternary (three-person) interactions with uniform positive reinforcement.
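To make the notions of discounting and trapping concrete, here is a minimal simulation sketch. It is not the model studied in the paper: it assumes a toy two-option urn in which both weights are multiplied by the discount rate 1−x at every step and the chosen option then receives reinforcement 1. The function name `steps_until_near_trap`, the threshold `eps`, and the use of "one option's share falls below `eps`" as a stand-in for being trapped are all illustrative assumptions. This toy urn should not be expected to reproduce the exponential scaling in 1/x described in the abstract; it only shows how a trapping-time proxy could be measured as x shrinks.

```python
import random


def steps_until_near_trap(x, eps=1e-3, max_steps=1_000_000, seed=0):
    """Toy two-option discounted urn (illustrative assumption, not the paper's model).

    Each step:
      1. pick option i with probability w[i] / (w[0] + w[1]);
      2. multiply both weights by the discount rate (1 - x);
      3. add reinforcement 1 to the chosen option.

    Returns the first step at which one option's share of the total weight
    falls below eps (a proxy for being trapped), or max_steps if that
    never happens within the step budget.
    """
    rng = random.Random(seed)
    w = [1.0, 1.0]
    for t in range(1, max_steps + 1):
        total = w[0] + w[1]
        chosen = 0 if rng.random() < w[0] / total else 1
        w[0] *= 1.0 - x  # discount (forget) the past
        w[1] *= 1.0 - x
        w[chosen] += 1.0  # reinforce the option just chosen
        if min(w) / (w[0] + w[1]) < eps:
            return t
    return max_steps
```

Averaging `steps_until_near_trap(x, seed=s)` over several seeds `s` for a decreasing sequence of `x` values gives a rough picture of how the proxy time changes as the discount rate 1−x approaches 1. The `max_steps` cap reflects the practical point made in the abstract: a simulation may end before the process is trapped.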

Similar resources

Continuous-Time Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (RL) is a general framework which studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work in hierarchical RL, such as the MAXQ method, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. This paper generalizes the MAXQ method to continuous-time discounte...

A New Bi-Objective Model for a Multi-Mode Resource-Constrained Project Scheduling Problem with Discounted Cash Flows and four Payment Models

The aim of a multi-mode resource-constrained project scheduling problem (MRCPSP) is to assign resource(s) with the restricted capacity to an execution mode of activities by considering relationship constraints, to achieve pre-determined objective(s). These goals vary with managers or decision makers of any organization who should determine suitable objective(s) considering organization strategi...

Effect of Steel Confinement on Behavior of Reinforced Concrete Frame

The strength and ductility of concrete improve under multi-axial compressive stress due to the confinement effect. Several parameters govern confinement in concrete, and various stress-strain models have been developed by different researchers. Longitudinal and transverse reinforcement steel can influence confinement in reinforced concrete members. In this paper, various stres...

Renewal Monte Carlo: Renewal theory based reinforcement learning

In this paper, we present an online reinforcement learning algorithm, called Renewal Monte Carlo (RMC), for infinite horizon Markov decision processes with a designated start state. RMC is a Monte Carlo algorithm and retains the advantages of Monte Carlo methods, including low bias, simplicity, and ease of implementation, while at the same time circumventing their key drawbacks of high variance a...

Model-Based Average Reward Reinforcement Learning

Reinforcement Learning (RL) is the study of programs that improve their performance by receiving rewards and punishments from the environment. Most RL methods optimize the discounted total reward received by an agent, while, in many domains, the natural criterion is to optimize the average reward per time step. In this paper, we introduce a model-based Average-reward Reinforcement Learning meth...

Publication date: 2003